Abstract: Missing value research has been around for at least two to three decades, but imputation of
missing values remains a major challenge in keeping databases intact. Missing value imputation methods
fall into two types: statistics-oriented and non-statistics-oriented. Statistics-based imputation has
numerous flaws that make it difficult to fine-tune or to expect perfect imputation, and it also has a
number of execution-related limitations. This motivates the non-statistical approach known as machine
learning, which we examine in this work. The Deep Belief Network (DBN) is an unsupervised probabilistic
generative model used in machine learning applications. It is built from Restricted Boltzmann Machines,
pre-trained with contrastive divergence, and fine-tuned with backpropagation to adjust the weights for
the imputation process. The stability of the DBN's imputed values rests on contrastive divergence. Data
from the PIMA medical dataset in the UCI Repository was used in the experiments. The DBN with
backpropagation is accurate up to 90% of the time, with a maximum mean square error rate of 10%,
compared to earlier imputation techniques. To evaluate the accuracy of the DBN, it is compared with
nearly five other imputation methods. In comparison to these methods, DBN imputation provides 90%
accuracy.


Keywords: Machine Learning, Unsupervised Learning, Deep Belief Network, Imputation, Artificial 
neural networks. 


Introduction 


There are numerous methods for handling missing data before data analysis [1]. One option is a
complete-case approach (listwise deletion), which analyses only the subjects with no missing data [2].
If the omitted subjects differ systematically from those included, the analysis has less power and can
be biased. Second, missing data can be replaced with the mean of the available cases [3-6]. This
reduces the variability of the data and underestimates standard deviations (SDs) and variances.
Missing data is a common methodological issue in large survey-based or epidemiological studies [7-12].
A regression model can also be used to impute missing values, using data from other variables to
predict the value of the variable for which data are missing. Still, regression imputation
overestimates the correlations between the target and explanatory variables and underestimates
variances and covariances [13-16]. Hot-deck imputation, used in surveys, identifies respondents who
are similar to the non-respondents and imputes the missing data from them [17-22]. Inverse probability
weighting is used to calculate statistics for a population that differs from the one from which the
data were collected. An estimate of the sampling probability can be used to increase the weight of
subjects who are under-represented because of a large amount of missing data [23-27]. Multiple
imputation based on multiple regression can also be used to fill in missing data [28].
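
The shrinking of the SD under mean imputation, noted above, can be seen in a small sketch. The values below are made up for illustration and are not taken from the PIMA dataset; `None` marks a missing entry.

```python
import statistics

# Hypothetical measurements; None marks a missing value (illustrative data only).
observed = [4.0, 7.0, None, 10.0, None, 13.0, 16.0]

# Complete-case analysis (listwise deletion): keep only the observed entries.
complete = [x for x in observed if x is not None]

# Mean imputation: replace every missing entry with the mean of the observed ones.
mean_value = statistics.mean(complete)
imputed = [mean_value if x is None else x for x in observed]

# The sample SD is computed on both versions of the data.
sd_complete = statistics.stdev(complete)
sd_imputed = statistics.stdev(imputed)

print(f"mean of observed values: {mean_value}")
print(f"SD, complete cases:      {sd_complete:.3f}")
print(f"SD, after mean imputing: {sd_imputed:.3f}")
```

The imputed entries sit exactly at the mean, so they add nothing to the sum of squared deviations while increasing the sample count. The mean is unchanged, but the SD computed on the imputed data is smaller than the SD of the complete cases.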


Published under an exclusive license by open access journals under Volume: 2 Issue: 7 in Jul-2022 
Copyright (c) 2022 Author (s). This is an open-access article distributed under the terms of Creative Commons Attribution 
License (CC BY).To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ 


International Journal of Innovative Analyses and Emerging Technology


| e-ISSN: 2792-4025 | http://openaccessjournals.eu | Volume: 2 Issue: 7 


The chosen imputation method may also affect study outcomes [29-33]. The validity of inverse
probability weighting and multiple imputation has been demonstrated when the assumptions of missing
completely at random (MCAR) and missing at random (MAR) hold [34-37]. Despite this extensive effort,
none of the methods mentioned above is entirely satisfactory. A strategy is needed that does not
affect the evaluation criteria [38-41]. The use of machine learning for missing data imputation has
been on the rise in recent years, and researchers have found that machine learning methods outperform
traditional statistical methods (such as mean imputation, hot-deck, and multiple imputation) in
handling missing data, resulting in better prediction accuracy of patient outcomes [42].


Deep learning based on artificial neural networks (ANNs) was first proposed in the early 1980s but
has only recently been put into practice, because the time and computational resources it requires
were prohibitive on the hardware then available [43-49]. With large datasets and graphics processing
units (GPUs) becoming more widely available, deep learning has rapidly gained prominence within a few
years (Hinton & Salakhutdinov, 2006) [50-54]. In speech recognition, language processing, and image
classification, deep learning outperforms traditional machine learning methods such as support vector
machines (SVMs) and fuzzy methods [55]. Because of its ability to infer abstract, high-level
representations, deep learning can be used to support the prevention, diagnosis, treatment, and
prognosis of brain disease. A small number of studies have used deep learning to classify disorders
such as Alzheimer's disease, dementia, and attention deficit hyperactivity disorder (ADHD). When
applied to large, high-dimensional datasets, deep learning-based approaches have been shown to be
effective at imputing missing data [56-59].


The goal of this study is to use deep learning to impute missing data in a medical dataset [60-75].
For data imputation, we apply deep learning to data from the UCI Repository of Machine Learning
Databases. Our total sample size (N = 768) was increased by combining samples from previous studies,
and missing data for the required features were imputed using a deep learning approach [76-82]. We
hypothesise that deep learning can fill in the missing values in the medical dataset and generate a
large imputed dataset that is as close as possible to the original in terms of the missing features
[83]. This approach makes it possible to measure how well a machine learns from data and how
accurately it predicts missing values. Using imputed features, we hypothesised
hyperactivity-impulsivity behaviours [84-92].


Related Work 


Machine learning models have been used in frameworks for detecting anomalies and intrusions in
real-world systems [93-115]. For imputation, Chand and coworkers combined SVM with nine other machine
learning models [116-119]. A group of well-known classifiers has been tested on the imputation of
unknown values. Dynamic behaviour models similar to the Hidden Markov Model, built from host user
profiles constructed from normal usage data, have been used to detect intrusions [120-125]. A hybrid
model combining SVM, decision trees, and Naive Bayes has also been proposed. For the classification
of host states based on network traffic behaviour, Li et al. provided an online Support Vector
Machine (SVM) with a decision tree. Boosted SVM was used for missing value feeds in [126-141]. Meng
compared a variety of machine learning models, including artificial neural networks, SVM, and
decision trees, to determine the best method for imputing missing values [142]. SVM and neural
networks were employed to compute the missing values automatically [143]. Fuzzy clustering was
presented as a way to reduce the rate of false positives. K-Means clustering was used to achieve
scalable unsupervised intrusion detection [144-151].


Deep Belief Network


The Deep Belief Network differs from a Deep Neural Network in two important respects: the structure
of the network and the training procedure. In terms of structure [152-166], a Deep Neural Network is
a feed-forward network with multiple hidden layers [167], in which each hidden neuron typically uses
the logistic (sigmoid) activation function [168]. The Deep Belief Network, in contrast, is
characterised by undirected connections between the layers of its Restricted Boltzmann Machines
[169-171].


In terms of training, a Deep Neural Network uses backpropagation with labelled data to adjust its
weights [172-179]. The Deep Belief Network instead uses unsupervised pre-training by contrastive
divergence (CD), followed by backpropagation to fine-tune the weights [180]. Deep Neural Networks
require large, balanced labelled datasets, but most industrial data lack such labels. The Deep Belief
Network is an unsupervised probabilistic generative model created by stacking Restricted Boltzmann
Machines [181-186]. In the first phase, the stacked Restricted Boltzmann Machines process the inputs
using the CD procedure [187]. Labelled data are not needed in this phase because CD is unsupervised
[188]. In the second phase, a supervised learning model, such as softmax/logistic regression or a
linear classifier trained by gradient descent, is used to adjust the pre-trained network [189-191].
This second phase only fine-tunes the model, because the Deep Belief Network's parameters are already
nearly stable after CD [192-196]. As a result, the Deep Belief Network requires fewer labelled data
points [197].


Restricted Boltzmann Machine 


The Restricted Boltzmann Machine (RBM) is a stochastic model with two layers: a visible layer and a
hidden layer. The visible layer has states $V = (v_1, \ldots, v_n)$, while the hidden layer has
states $H = (h_1, \ldots, h_m)$, which cannot be directly measured [198-199]. The two layers are
fully connected to each other, whereas states within the same layer have no connections to one
another and are therefore conditionally independent given the other layer. The Restricted Boltzmann
Machine is a kind of energy-based model: each configuration of the variables of interest is
associated with a scalar energy. Learning consists of adjusting the energy function so that it has
the desired properties. Equation (1) gives the joint probability distribution of (v, h) in a
Restricted Boltzmann Machine in terms of the energy function.


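$$p(v, h) = \frac{1}{Z} e^{-E(v, h)} \qquad (1)$$

where $Z = \sum_{v, h} e^{-E(v, h)}$ is the normalising constant (partition function).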


The energy function $E(v, h)$ is defined in Equation (2), where $w_{ij}$ denotes the weight between
hidden state $h_i$ and visible state $v_j$, and $b_i$ and $a_j$ are constant-valued offsets, or
biases. The biases shift the energy function during training in order to improve model performance.
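
For reference, the standard RBM energy function that matches the symbols described above can be written as follows. This is a reconstruction under the assumption that the usual binary RBM form is intended; it is not necessarily the authors' exact notation.

```latex
E(v, h) = -\sum_{j} a_j v_j \;-\; \sum_{i} b_i h_i \;-\; \sum_{i} \sum_{j} h_i \, w_{ij} \, v_j
```

With this form, $\partial E / \partial w_{ij} = -v_j h_i$, and the conditional probability that a hidden state is on is $p(h_i = 1 \mid v) = \sigma(b_i + \sum_j v_j w_{ij})$, where $\sigma$ is the logistic sigmoid. This agrees with the sampling probabilities used in the training section.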


Restricted Boltzmann Machine Training 


The central task in Restricted Boltzmann Machine training is to learn the weight matrix
$W = \{w_{ij}\}$ that maximises the log-likelihood $\log p(v)$. From Equations (2) and (3), we have,


$$\frac{\partial E(v, h)}{\partial w_{ij}} = -v_j h_i \;\Rightarrow\; \frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_j h_i \rangle^{0} - \langle v_j h_i \rangle^{\infty} \;\Rightarrow\; \Delta w_{ij} = \epsilon \left( \langle v_j h_i \rangle^{0} - \langle v_j h_i \rangle^{\infty} \right) \qquad (2)$$

where $\langle v_j h_i \rangle^{t}$, $t = 0, \ldots, \infty$, denotes the expectation of the random
variable $v_j h_i$ at sampling step t, and $\epsilon$ is the learning rate for weight updating.
Samples of $\langle v_j h_i \rangle^{t}$ are obtained by running a Markov chain towards convergence
using Gibbs sampling. In the Restricted Boltzmann Machine, the visible states are conditionally
independent given the hidden states, so all visible states can be sampled simultaneously while the
hidden states are held fixed. Similarly, all hidden states can be sampled simultaneously given the
visible states. So, given $h_i^{t}$ or $v_j^{t}$ at step t, $h_i^{t+1}$ or $v_j^{t+1}$ at step t + 1
can be obtained by Equations (5) and (6). That is, $h_i^{t+1}$ is set to 1 with probability
$\sigma(b_i + \sum_j v_j w_{ij})$, and $v_j^{t+1}$ is set to 1 with probability
$\sigma(a_j + \sum_i h_i w_{ij})$. As $t \to \infty$, the samples approach true samples from p(v).
Figure 1 illustrates this sampling process.


Figure 1. Restricted Boltzmann Machine sampling process
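
The alternating sampling of hidden and visible states can be sketched in plain Python as below. This is a minimal illustration for binary states, not the authors' implementation. The layout `W[i][j]` (weight between hidden state i and visible state j), the function names, and the toy sizes are assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sample_hidden(v, W, b, rng):
    """Sample all hidden states at once given the visible states v."""
    probs = [sigmoid(b[i] + sum(W[i][j] * v[j] for j in range(len(v))))
             for i in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, a, rng):
    """Sample all visible states at once given the hidden states h."""
    probs = [sigmoid(a[j] + sum(W[i][j] * h[i] for i in range(len(h))))
             for j in range(len(a))]
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_chain(v0, W, a, b, steps, rng):
    """Run `steps` alternating updates (visible -> hidden -> visible)."""
    v = list(v0)
    h = sample_hidden(v, W, b, rng)
    for _ in range(steps):
        v = sample_visible(h, W, a, rng)
        h = sample_hidden(v, W, b, rng)
    return v, h

# Toy model (assumed sizes): 3 hidden states, 4 visible states, small random weights.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(3)]
a = [0.0] * 4  # visible biases
b = [0.0] * 3  # hidden biases

v_final, h_final = gibbs_chain([1, 0, 1, 0], W, a, b, steps=10, rng=rng)
print(v_final, h_final)
```

Because each layer is conditionally independent given the other, every state in a layer is drawn in a single pass, which is what makes one Gibbs step cheap.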


The sampling process can be accelerated using Contrastive Divergence (CD). First, rather than
starting the Markov chain at a random state, CD starts it at the training samples, which are closer
to the true distribution, so the chain converges faster. Second, CD does not wait for the Markov
chain to converge; it stops after k steps of Gibbs sampling. Even with a small k (in practice, k is
often set to 1), the algorithm produces solutions close to the maximum likelihood solution.


Structure of the Deep Belief Network


The Deep Belief Network is a stochastic model. It is built primarily by stacking Restricted Boltzmann
Machines, as shown in Figure 2, and consists of a stack of Restricted Boltzmann Machines and a
classifier. The Deep Belief Network's training process comprises pre-training and fine-tuning.


Figure 2: Architecture of Deep Belief Network


Experimentation and Result Outcome


Pre-training


Pre-training is required to stack Restricted Boltzmann Machines. The hidden layer of one Restricted
Boltzmann Machine serves as the visible layer of the next. With X denoting the input matrix, the
pre-training procedure is as follows:


> Train the first-layer Restricted Boltzmann Machine on X using Contrastive Divergence to obtain its
weight matrix.


> Transform X with the weight matrix to produce new data X'.
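
The pre-training steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses CD with k = 1, binary toy data, and hidden activation probabilities as the transformed data X'. The function names, layer sizes, learning rate, and epoch count are all hypothetical.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hidden_probs(v, W, b):
    return [sigmoid(b[i] + sum(w * x for w, x in zip(W[i], v))) for i in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(a))]

def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def train_rbm(X, n_hidden, epochs, lr, rng):
    """Train one RBM on the rows of X with CD-1; returns (W, a, b)."""
    n_visible = len(X[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
    a = [0.0] * n_visible  # visible biases
    b = [0.0] * n_hidden   # hidden biases
    for _ in range(epochs):
        for v0 in X:
            ph0 = hidden_probs(v0, W, b)   # statistics at step 0 (data)
            h0 = sample(ph0, rng)
            v1 = visible_probs(h0, W, a)   # one Gibbs step; probabilities used as reconstruction
            ph1 = hidden_probs(v1, W, b)   # statistics at step 1 (stand-in for the infinite chain)
            for i in range(n_hidden):
                for j in range(n_visible):
                    W[i][j] += lr * (ph0[i] * v0[j] - ph1[i] * v1[j])
                b[i] += lr * (ph0[i] - ph1[i])
            for j in range(n_visible):
                a[j] += lr * (v0[j] - v1[j])
    return W, a, b

def transform(X, W, b):
    """Map each row of X to hidden activation probabilities (the new data X')."""
    return [hidden_probs(v, W, b) for v in X]

def pretrain_stack(X, layer_sizes, epochs, lr, rng):
    """Greedy layer-wise pre-training: each RBM is trained on the previous layer's output."""
    layers, data = [], X
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(data, n_hidden, epochs, lr, rng)
        layers.append((W, a, b))
        data = transform(data, W, b)
    return layers, data

# Toy binary input matrix X (illustrative only).
X = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
rng = random.Random(0)
layers, X_prime = pretrain_stack(X, layer_sizes=[3, 2], epochs=200, lr=0.1, rng=rng)
print(X_prime)
```

The update inside `train_rbm` is the weight-update rule from the training section with the infinite-chain expectation replaced by the statistics after a single Gibbs step. No labels are used anywhere in this phase.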
Fine-tuning 


Backpropagation and softmax regression are used to fine-tune the Deep Belief Network. Backpropagation
adjusts the weights by applying the chain rule to the model error, which is propagated through the
entire stack. The batch size and the number of epochs are two critical parameters. During
fine-tuning, the training samples are divided into equal-sized batches, and the samples in each batch
are fed through the network before a weight update is made. The number of fine-tuning iterations is
determined by the number of epochs. More weight updates can be achieved with a smaller batch size or
more epochs (Gau et al., 2008). Softmax regression is an extension of logistic regression that can be
used with more than two classes. It gives the conditional probability of Y = i (class i) given the
output X' of the stacked Restricted Boltzmann Machines, the coefficient matrix C, and the intercept
d. The class with the highest probability is selected as the prediction.
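
The softmax step can be sketched as below. It computes class probabilities from a feature vector X', a coefficient matrix C, and an intercept vector d, and selects the most probable class. The numeric values are hypothetical, and the layout `C[k]` (one coefficient row per class) is an assumption made for the example.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x_prime, C, d):
    """Return (class probabilities, index of the most probable class)."""
    scores = [d[k] + sum(c * x for c, x in zip(C[k], x_prime)) for k in range(len(d))]
    probs = softmax(scores)
    winner = max(range(len(probs)), key=probs.__getitem__)
    return probs, winner

# Hypothetical output of the stacked RBMs for one sample, with three classes.
x_prime = [0.9, 0.1]
C = [[2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]
d = [0.0, 0.0, 0.5]

probs, winner = predict(x_prime, C, d)
print(probs, winner)
```

With only two classes this reduces to logistic regression, which is why softmax regression is described as its multi-class extension.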


Dataset 


The UCI Repository is a collection of databases, domain theories and data generators that are used by
the machine learning community to test and evaluate algorithms. For each dataset, the repository
lists the dataset name, the attribute and instance types, and the default task.


Table 1. PIMA Dataset descriptions 


Data Set Characteristics: Multivariate | Number of Instances: 768 | Area: Life
Attribute Characteristics: Integer, Real | Number of Attributes: 8 | Date Donated: 1990-05-09
Associated Tasks: Classification | Missing Values?: Yes | Number of Web Hits: 98164
Training Data Set: 16


As summarised in Table 1, the PIMA Dataset from the UCI repository, with 768 instances, is used with
the Deep Belief Network. The PIMA Dataset is multivariate, with integer and real attributes. Training
on the dataset requires the weights $w_{ij}$ to converge. The data contain nine missing features, and
imputation was run on the training set nearly 8 times to find the $w_{ij}$ using these missing
features. Figure 3 shows the results of four value imputation methods for single attributes: DBN,
Fuzzy Clustering, Penalized FCM (black labels), and Statistical Mean (blue labels).


Figure 3. Assessment of imputation accuracy (axis label: Imputation Accuracy Rate)


Only one figure is shown here. The Deep Belief Network approach (blue label) imputes values correctly
for up to 90% of the N = 768 instances (for attribute 3) and fails on the remaining 10%. Testing also
revealed an interesting fact: the value imputation was limited to 65 percent, and the failed cases to
35 percent. The remaining imputation approaches achieved 61 percent (red label), 70 percent (green
label), and 70 percent (black label).
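
The text does not spell out how the accuracy rate and mean square error are computed. One plausible procedure, sketched here purely as an assumption, is to hide a share of the known values, impute them, and compare the imputed values with the hidden truth. The tolerance, the sample column, and the mean-fill baseline below are all hypothetical choices for illustration.

```python
import random

def evaluate_imputation(true_values, imputed_values, tolerance):
    """Return (mean square error, share of imputations within `tolerance` of the truth)."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(true_values, imputed_values)]
    mse = sum(e * e for e in errors) / len(errors)
    hits = sum(1 for e in errors if abs(e) <= tolerance)
    return mse, hits / len(errors)

def choose_hidden_indices(n, fraction, rng):
    """Pick which known entries to hide so they can serve as ground truth."""
    count = max(1, round(fraction * n))
    return sorted(rng.sample(range(n), count))

# Hypothetical fully observed column (illustrative values only).
column = [72.0, 66.0, 64.0, 66.0, 40.0, 74.0, 50.0, 70.0, 96.0, 92.0]
rng = random.Random(0)
hidden_idx = choose_hidden_indices(len(column), 0.3, rng)

# Baseline imputer: fill every hidden entry with the mean of the remaining entries.
kept = [x for i, x in enumerate(column) if i not in hidden_idx]
fill_value = sum(kept) / len(kept)

truth = [column[i] for i in hidden_idx]
mse, accuracy = evaluate_imputation(truth, [fill_value] * len(truth), tolerance=10.0)
print(f"MSE = {mse:.2f}, accuracy = {accuracy:.0%}")
```

Any imputer, including a trained DBN, could be substituted for the mean-fill baseline, which would allow different methods to be compared on the same hidden entries.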


Conclusion 


The unsupervised machine learning model Deep Belief Network is used to accomplish two main goals. The
first and most important is the imputation of missing values. The second is to improve the accuracy
and stability of the imputation, for which the Deep Belief Network procedure uses a backpropagation
approach. Ninety percent of the imputations were accurate. To achieve high imputation accuracy, the
attributes of the medical dataset are trained separately using machine learning techniques. CD plays
a critical role in determining the stability of the imputation accuracy. As part of this work,
pre-trained classifiers have been applied to the dataset. The dataset contains 768 instances, and
three of its attributes are considered. The imputation ratio of an attribute depends on the other
attributes. The next step will be to reduce the remaining 10% of failed imputations.